Sir:

I, MAURO MAGNANI, Ph.D., do declare as follows:

1. I am currently Professor of Biochemistry, Director of the Centre of
Biotechnology, and Vice-Rector of the University of Urbino, Urbino, Italy. I have over thirty
years of experience as a biochemist in the research and development of products and
applications useful in the biotechnology and pharmaceutical industries. I am included in the
official list of professional biologists in Italy with n. 017484 "Ordine Nazionale Biologi," and
I am a Technical Director nominated by the "Agenzia Italiana del Farmaco, AIFA" with n.
AIDT-19/2005. My education and experience are summarized on my Curriculum Vitae,
which is attached hereto as Exhibit 1.

2. I have collaborated with Dr. Barbara Ensoli, who is the inventor of the above-identified
application No. 09/555,534 (hereinafter "the '534 application"), in the development
of methods to produce recombinant, biologically active Tat protein. I also supervise and have
supervised the production of such biologically active Tat protein, according to good
manufacturing practices (GMP), for use in human clinical trials. In particular, I have
experience in reversed-phase high pressure liquid chromatography (RP-HPLC) purification
of biologically active Tat protein, which my laboratory has carried out many times.

3. I have been asked to review Gu et al., Sep. Technol., 1994, 4:258-260 ("Gu et
al."), and to consider, based on my experience, whether the phase separation method
described in Gu et al. is applicable to removing acetonitrile from compositions comprising
biologically active Tat protein purified by RP-HPLC. For the reasons discussed below, it is
my judgment and opinion that the phase separation method of Gu et al. is not applicable to
removing acetonitrile from compositions comprising biologically active Tat protein purified
by RP-HPLC.

4. Gu et al. discloses removing acetonitrile from RP-HPLC effluent fractions by
a phase separation method (see Abstract). In particular, Gu et al. discloses that when an
RP-HPLC effluent fraction containing 65% (vol.) acetonitrile/35% water/0.1% trifluoroacetic
acid (TFA) is stored in a freezer at -17°C for several hours, a phase separation occurs such
that a top phase containing 88% (vol.) acetonitrile, and a bottom phase containing 65% (vol.)
water and 99%+ of the human growth hormone (hGH) protein, are formed (see Abstract; and
page 258, right column, first paragraph). Gu et al. indicates that the phase separation "occurs
only in the [acetonitrile] concentration range of 35-88%" (see page 259, right column, second
paragraph, lines 3-4) (emphasis in original), and explains that the reason the hGH protein and
its genetically engineered analog (hGHG120R) stay in the bottom phase is probably due to
their hydrophilicity (see page 260, sentence bridging left and right columns).

5. In my experience, Tat proteins elute during RP-HPLC at acetonitrile
concentrations between 25 and 30%, more precisely at acetonitrile concentrations between 28
and 30%. I have never observed any Tat protein to elute during RP-HPLC at an acetonitrile
concentration higher than 35%. Therefore, it is my judgment and opinion that the phase
separation method of Gu et al. is not applicable to removing acetonitrile from compositions
comprising biologically active Tat protein purified by RP-HPLC, since the Tat protein does
not elute at the required acetonitrile concentration range of 35-88% during RP-HPLC.

6. Attached as Exhibits 2-4 are hydrophilicity profiles of the Tat protein, hGH
protein, and hGHG120R analog, respectively. I have been informed that these profiles were
obtained using ProtScale (http://www.expasy.org/tools/protscale.html), which allows
computation and representation (in the form of a two-dimensional plot) of the hydrophilicity
profile of selected proteins based on the amino acid scale of Hopp and Woods, "Prediction of
protein antigenic determinants from amino acid sequences," Proc. Natl. Acad. Sci. USA,
June 1981, Vol. 78, No. 6, pp. 3824-3828 ("Hopp & Woods"), a copy of which reference is
attached as Exhibit 5. I am informed that the sequence of the Tat protein used to generate its
hydrophilicity profile was the sequence disclosed in the '534 application at page 37, lines
9-10. I am also informed that the sequence of the hGH protein used to generate its
hydrophilicity profile was the amino acid sequence having GenBank accession number
AAA72260, obtained on December 17, 2008 from the online database of the National Center
for Biotechnology Information (NCBI) at [...]
in Gu et al. at page 258, left column, second paragraph, lines 1-2, and that the sequence of the
hGHG120R analog used to generate its hydrophilicity profile is the sequence of GenBank
accession number AAA72260, except that the glycine (G) residue at position 121 is replaced
with arginine (R). Briefly, according to the method of Hopp & Woods, to generate the
hydrophilicity profile of a protein, each amino acid residue in the sequence of the protein is
assigned its hydrophilicity value, then these values are repetitively averaged down the length
of the polypeptide chain, generating a series of local hydrophilicity values, which are
averaged at each repetition in groups of [illegible] (see Hopp & Woods, page 3824, right column,
fourth paragraph, lines 1-7). As evidenced by the hydrophilicity profiles in Exhibits 2-4, the
profile for the Tat protein differs markedly from the profiles for the hGH protein and
hGHG120R analog, respectively, in that the profiles for the hGH protein and hGHG120R
analog contain many more high points of local hydrophilicity, dispersed along the length of
the proteins, than are found in the profile for the Tat protein. It is my judgment and opinion
that, based on the differences in the hydrophilicity profiles between, on the one hand, the hGH
protein and hGHG120R analog, and on the other hand, the Tat protein, one of ordinary skill
in the art would not expect that biologically active Tat protein would be preferentially present
in the bottom, predominantly water, phase of Gu et al.'s phase separation method; rather, one
would expect that much of the Tat protein would be removed along with the acetonitrile
phase or remain at the interface between the water phase and acetonitrile phase.

7. I also have been asked to review U.S. Patent No. 5,646,120 by Sumner-Smith
et al. ("Sumner-Smith et al.") and to consider, based on my experience, whether an acid
exchange reaction such as that described by Sumner-Smith et al. could be used to exchange
the TFA present in a composition with biologically active Tat protein after RP-HPLC,
without loss of biological activity of the Tat protein. For the reasons discussed below, it is
my judgment and opinion that an acid exchange reaction such as that described by
Sumner-Smith et al. would be expected to destroy biological activity of Tat protein purified by
RP-HPLC.



8. In my experience, a Tat protein is completely denatured during RP-HPLC that
occurs in a non-aqueous acetonitrile/TFA solvent. If the Tat protein is lyophilized after
RP-HPLC, most of the acetonitrile and TFA will be removed by the lyophilization, since
acetonitrile and TFA are volatile; if the Tat protein is then suspended in a buffer compatible
with the biological activity of the Tat protein (i.e., in an aqueous, pH-buffered, neutral
solvent), in this compatible solvent, the Tat protein regains its native conformation and
biological activity. However, if the TFA remaining with the biologically active Tat protein after
RP-HPLC elution, lyophilization, and resuspension is exchanged with another acid in an
aqueous solvent, such as by being subjected to the acid exchange reaction disclosed in
Sumner-Smith et al. at col. 9, lines 44-62, I would expect the three-dimensional conformation of the
Tat protein to be damaged by the acid, which acid is chemically reactive in the aqueous
solvent, and thus the Tat protein would be expected to lose biological activity. Thus, in my
judgment and opinion, the acid exchange reaction described by Sumner-Smith et al. would
destroy the biological activity of HIV Tat protein purified by RP-HPLC.

9. I hereby declare that all statements made herein of my own knowledge are true
and that all statements made on information and belief are believed to be true; and further that
I make these statements with the knowledge that willful false statements and the like so made
are punishable by fine or imprisonment, or both, under Section 1001 of Title 18 of the United
States Code, and that such willful false statements may jeopardize the validity of this
application, and any patent issuing thereon.




Date: January 3th, 2009



Mauro Magnani 
Professor of Biochemistry 
EXHIBIT 1



Curriculum Vitae - Prof. Mauro Magnani 

MAGNANI Prof. Mauro, Ph.D. Italian, Professor of Biochemistry 
BORN: April 9, 1953, Italy 
LANGUAGES: Italian, English 
EDUCATION: Univ. Urbino, Italy, Ph.D., 1976 

PRIMARY POSITION: Professor of Biochemistry and Director Centre of Biotechnology. 
PROFESSIONAL CAREER: Visiting Researcher, Dept. Biochemistry, Univ. Birmingham,
1980; Visiting Prof., Dept. Biology, Haifa, Israel, 1983; Asst. Prof., Univ. Urbino, 1977-82,
Assoc. Prof., 1982-1986, Prof. 1986 - ; Dean, Faculty of Sciences, University of Urbino
1995-2001; Director, Interuniversity Consortium for Biotechnology (CIB) 1998-2004; Vice Rector
of the University of Urbino 2001- . Included in the official list of professional biologists in Italy
with n. 017484 "Ordine Nazionale Biologi". Technical Director nominated by the "Agenzia
Italiana del Farmaco, AIFA" with n. AIDT-19/2005.

CURRENT RESEARCH: Development of new drug delivery and drug targeting systems; 
Protein turnover ubiquitination and regulation of gene expression; Mechanisms of drug 
resistance and drug toxicity; Modulation of NF-kB and gene expression by oligonucleotide 
decoys, vaccine development; nanobiotechnology in drug delivery. 

PUBLICATIONS: over 300 articles published in international refereed scientific journals; 
Co-editor of three books: 

"Red Blood Cell Aging", Plenum Press, N.Y., 1991, pp. 383. 

"The Use of Resealed Erythrocytes as Carriers and Bioreactors" , Plenum Press, N.Y., 1992, 
pp. 361. 

"Erythrocyte Engineering for Drug Delivery and Targeting", Landes Bioscience, 2002. 
REFEREE: Programmes of the E.U.; The International Science Foundation (U.S.A.); Target 
Project "Biotechnology" of the National Research Council (C.N.R.).; Member of the Project 
"Patologia clinica e terapia dell'infezione da HIV" of the Italian Ministry of Health; PRIN 
and FIRB Projects of Italian Ministry of University and Research; Member of Committee 
Post Genoma (C.N.R.); Included in the "Albo degli Esperti" of M.I.U.R. and Eureka Projects of
EU. 

REVIEWER: Biotechnology and Applied Biochemistry; Nature Biotechnology; Drugs; 
Leukemia; European Journal Haematology; Biochimica et Biophysica Acta; Blood; Journal of 
Cellular Engineering; Journal of Internal Medicine; Journal of Acquired Immune Deficiency 
Syndromes and Human Retrovirology; Mechanisms of Ageing and Development; Antiviral 
Research; Journal of Chromatography; Journal of Biological Regulators and Homeostatic 
Agents; Life Sciences; Biochemistry; International Journal of Biochemistry and Cell Biology; 
Human Gene Therapy; European Journal of Biochemistry; Clinical Pharmacokinetics; 
Autoimmunity; Oncogene; Haematologica; J. Controlled Release; Editorial Board: Current 
Drug Targets, Biotechnology. 

PATENTS 

European Patent EP 0517986B1

M. Magnani, L. Rossi "Transformed erythrocytes, process for preparing the same, and their 
use in pharmaceutical compositions" 

US Patent 5,753,221 






M. Magnani, L. Rossi "Transformed erythrocytes, process for preparing the same, and their 
use in pharmaceutical compositions" 

US Patent N. 6.139.836 

Mauro Magnani, Ivo Panzani, Leonardo Bigi, Andrea Zanella "Method of encapsulating 
biologically active agents within erythrocytes, and apparatus therefor". 
Assignee: Dideco S.p.A., Mirandola, Italy 

European Patent N. EP98830479.6 

M. Magnani, G. Brandi, A. Fraternale, A. Casabianca "Pharmaceutical composition or 
composition package containing a pyrimidine nucleoside analogue and a purine nucleoside 
analogue ". 

Brevetto C.N.R. N. RM92 A 000377 

M. Magnani "Antigeni legati alla superficie esterna di eritrociti e procedimento per la loro
preparazione " 

Brevetto C.N.R. N. RM 93 A 000474 

M. Magnani "Eritrociti incorporanti alcool ossidasi e loro uso nelle intossicazioni da 
metanolo " 

Brevetto C.N.R. 

M. Magnani, L. Rossi, G. Brandi, E. Millo, G. Damonte, U. Benatti, A. De Flora 
"Profarmaco di acyclovir e suo uso in composizioni farmaceutiche " 

Brevetto di Invenzione N. MI2002A01196 - 06/06/1996 - PCT/IT 02/00368 del 13/06/2002
M. Magnani, C. Fiorucci, P. Filippone, G. Brandi, M. Paiardini. "Derivato tetramerico
dell'indol-3 carbinolo ad attività anticancerogena e metodo di sintesi del derivato stesso".

Brevetto di Invenzione N. TO2001A01077 - 16/11/2001

M. Magnani, F. Graziano, A. Ruzzo "Mutazioni della linea germinale nel promotore del gene
della E-caderina e metodi di diagnosi per individuare una maggiore suscettibilità al
carcinoma gastrico".

Brevetto N. TO2003A001048 - 30/12/2003 - PCT/EP/2004/053726 - 29/12/2004 
U.Benatti, G. Brandi, E. Garaci, M. Magnani, E. Millo, A.T. Palamara, L. Rossi. "Derivati del 
glutathione e loro utilizzo per il trattamento di malattie virali ". 

EXHIBIT 2

ProtScale analysis

ProtScale

User-provided sequence:

        10         20         30         40         50         60
MEPVDPRLEP WKHPGSQPKT ACTNCYCKKC CFHCQVCFIT KALGISYGRK KRRQRRRPPQ

        70         80
GSQTHQVSLS KQPTSQSRGD PTGPKE

SEQUENCE LENGTH: 86

Using the scale Hphob. / Hopp & Woods, the individual values for the 20 amino acids are:

Ala: -0.500  Arg:  3.000  Asn:  0.200  Asp:  3.000  Cys: -1.000  Gln:  0.200
Glu:  3.000  Gly:  0.000  His: -0.500  Ile: -1.800  Leu: -1.800  Lys:  3.000
Met: -1.300  Phe: -2.500  Pro:  0.000  Ser:  0.300  Thr: -0.400  Trp: -3.400
Tyr: -2.300  Val: -1.500     :  1.600     :  1.600     : -0.215

Weights for window positions 1,..,9, using linear weight variation model:

   1    2    3    4    5    6    7    8    9
1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
edge              center              edge

MIN: -1.289
MAX: 2.689

The results of your ProtScale query are available in the following formats:

• Image in GIF-format
• Image in Postscript-format
• Numerical format (verbose)
• Numerical format (minimal, to be exported into an external application)



http://www.expasy.ch/cgi-bin/protscale.pl?! 
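
The MIN and MAX figures in this printout can be checked by hand from the listed scale values. The following is a minimal sketch, not ProtScale's own code: it assumes that, with all nine window weights equal to 1.00, each profile point is simply the arithmetic mean of the nine residue values in the window. The names `HOPP_WOODS`, `TAT` and `window_means` are illustrative only.

```python
# Hopp & Woods hydrophilicity values, copied from the ProtScale listing above.
HOPP_WOODS = {
    "A": -0.5, "R": 3.0, "N": 0.2, "D": 3.0, "C": -1.0, "Q": 0.2,
    "E": 3.0, "G": 0.0, "H": -0.5, "I": -1.8, "L": -1.8, "K": 3.0,
    "M": -1.3, "F": -2.5, "P": 0.0, "S": 0.3, "T": -0.4, "W": -3.4,
    "Y": -2.3, "V": -1.5,
}

# User-provided Tat sequence as printed in this exhibit (86 residues).
TAT = (
    "MEPVDPRLEP" "WKHPGSQPKT" "ACTNCYCKKC"
    "CFHCQVCFIT" "KALGISYGRK" "KRRQRRRPPQ"
    "GSQTHQVSLS" "KQPTSQSRGD" "PTGPKE"
)


def window_means(sequence, window=9):
    """Unweighted sliding-window mean (assumed equivalent to uniform 1.00 weights)."""
    values = [HOPP_WOODS[residue] for residue in sequence]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


profile = window_means(TAT)
print(f"MIN: {min(profile):.3f}")
print(f"MAX: {max(profile):.3f}")
```

Under that assumption the sketch reproduces the reported extremes: the minimum comes from the window CFHCQVCFI (residues 31-39) and the maximum from RKKRRQRRR (residues 49-57).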

EXHIBIT 3

ProtScale analysis

ProtScale

User-provided sequence:

        10         20         30         40         50         60
MFPTIPLSRL FDNAMLRAHR LHQLAFDTYQ EFEEAYIPKE QKYSFLQNPQ TSLCFSESIP

        70         80         90        100        110        120
TPSNREETQQ KSNLELLRIS LLLIQSWLEP VQFLRSVFAN SLVYGASDSN VYDLLKDLEE

       130        140        150        160        170        180
GIQTLMGRLE DGSPRTGQIF KQTYSKFDTN SHNDDALLKN YGLLYCFRKD MDKVETFLRI

       190
VQCRSVEGSC GF

SEQUENCE LENGTH: 192

Using the scale Hphob. / Hopp & Woods, the individual values for the 20 amino acids are:

Ala: -0.500  Arg:  3.000  Asn:  0.200  Asp:  3.000  Cys: -1.000  Gln:  0.200
Glu:  3.000  Gly:  0.000  His: -0.500  Ile: -1.800  Leu: -1.800  Lys:  3.000
Met: -1.300  Phe: -2.500  Pro:  0.000  Ser:  0.300  Thr: -0.400  Trp: -3.400
Tyr: -2.300  Val: -1.500     :  1.600     :  1.600     : -0.215

Weights for window positions 1,..,9, using linear weight variation model:

   1    2    3    4    5    6    7    8    9
1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
edge              center              edge

MIN: -1.289
MAX: 1.644


http://www.expasy.ch/cgi-bin/protscale.pl?! 

EXHIBIT 4

ProtScale analysis

ProtScale

User-provided sequence:

        10         20         30         40         50         60
MFPTIPLSRL FDNAMLRAHR LHQLAFDTYQ EFEEAYIPKE QKYSFLQNPQ TSLCFSESIP

        70         80         90        100        110        120
TPSNREETQQ KSNLELLRIS LLLIQSWLEP VQFLRSVFAN SLVYGASDSN VYDLLKDLEE

       130        140        150        160        170        180
RIQTLMGRLE DGSPRTGQIF KQTYSKFDTN SHNDDALLKN YGLLYCFRKD MDKVETFLRI

       190
VQCRSVEGSC GF

SEQUENCE LENGTH: 192

Using the scale Hphob. / Hopp & Woods, the individual values for the 20 amino acids are:

Ala: -0.500  Arg:  3.000  Asn:  0.200  Asp:  3.000  Cys: -1.000  Gln:  0.200
Glu:  3.000  Gly:  0.000  His: -0.500  Ile: -1.800  Leu: -1.800  Lys:  3.000
Met: -1.300  Phe: -2.500  Pro:  0.000  Ser:  0.300  Thr: -0.400  Trp: -3.400
Tyr: -2.300  Val: -1.500     :  1.600     :  1.600     : -0.215

Weights for window positions 1,..,9, using linear weight variation model:

   1    2    3    4    5    6    7    8    9
1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
edge              center              edge

MIN: -1.289
MAX: 1.644

http://www.expasy.ch/cgi-bin/protscale.pl?!



12/17/2008 



EXHIBIT 5 



Proc. Natl. Acad. Sci. USA

Vol. 78, No. 6, pp. 3824-3828, June 1981 

Immunology 

Prediction of protein antigenic determinants from amino 
acid sequences 

(hydrophilicity analysis/protein conformation) 

Thomas P. Hopp and Kenneth R. Woods 

The Lindsley F. Kimball Research Institute, The New York Blood Center, 310 East 67th Street, New York, New York 10021
Communicated by Bruce Merrifield, March 2, 1981 



ABSTRACT A method is presented for locating protein anti- 
genic determinants by analyzing amino acid sequences in order 
to find the point of greatest local hydrophilicity. This is accom- 
plished by assigning each amino acid a numerical value (hydro- 
philicity value) and then repetitively averaging these values along 
the peptide chain. The point of highest local average hydrophilicity 
is invariably located in, or immediately adjacent to, an antigenic 
determinant. It was found that the prediction success rate de- 
pended on averaging group length, with hexapeptide averages 
yielding optimal results. The method was developed using 12 pro- 
teins for which extensive immunochemical analysis has been car- 
ried out and subsequently was used to predict antigenic deter- 
minants for the following proteins: hepatitis B surface antigen, 
influenza hemagglutinins, fowl plague virus hemagglutinin, hu- 
man histocompatibility antigen HLA-B7, human interferons, 
Escherichia coli and cholera enterotoxins, ragweed allergens Ra3 
and Ra5, and streptococcal M protein. The hepatitis B surface 
antigen sequence was synthesized by chemical means and was 
shown to have antigenic activity by radioimmunoassay. 



The elucidation of protein antigenic structures is presently a 
difficult, uncertain, and time-consuming task. To precisely de- 
lineate antigenic determinants, it is necessary to prepare a large 
number of well-characterized chemical derivatives and peptide 
fragments from the original protein antigen and then to test 
these derivatives for immunological activity (1, 2). Alterna- 
tively, a homologous series of proteins may be used to assess 
the influence of particular amino acid substitutions, thereby im- 
plicating certain regions as antigenic determinants (3, 4); this 
approach requires knowledge of complete primary structures 
for a number of proteins before the immunological results can 
be interpreted. Despite the laboriousness of available ap- 
proaches, the complete antigenic structures have been eluci- 
dated for a small number of proteins, and partial information 
is available for many others. 

As more information becomes available on protein antigens, 
it should be possible to use this information to predict the lo- 
cations of antigenic determinants before any immunological 
testing has been carried out. In recent years a number of sys- 
tems have been developed to predict protein conformational 
features from amino acid sequences (5-8), but none of these 
were specifically oriented to the prediction of antigenic deter- 
minants. Therefore, we sought a method that was not predi- 
cated upon predictions of particular structural features but 
rather sought a simple correlation with surface location of 
stretches of peptide chain and the likelihood of antibody bind- 
ing. A guiding principle was the notion that many surface ori- 
ented regions are nonantigenic (1). This led us to take an em- 
pirical approach in our analysis and to arbitrarily manipulate the 
emphasis placed on certain amino acids in order to find a par- 



The publication costs of this article were defrayed in part by page charge 
payment. This article must therefore be hereby marked "advertise- 
ment" in accordance with 18 U. S. C. §1734 solely to indicate this feet. 



ticular kind of sequence that is favored for antibody binding 
(which may not strictly depend on the hydrophilicity of the se- 
quence). The present report describes a system that uses a sim- 
plified method to successfully predict antigenic determinants, 
given the amino acid sequence of a protein and no other 
information. 

METHOD 

Previous investigations have demonstrated that antigenic de- 
terminants are surface features of proteins and indicate that they 
are frequently found on regions of a molecule that have an unusually
high degree of exposure to solvent, i.e., regions which
project into the medium (for reviews, see refs. 1 and 3). This,
together with the fact that charged, hydrophilic amino acid side 
chains are common features of antigenic determinants, led us 
to investigate the possibility that at least some antigenic deter- 
minants might be associated with stretches of amino acid se- 
quence that contain a large number of charged and polar resi- 
dues and are lacking in large hydrophobic residues. A suitable 
means of methodically searching for such regions was found by 
combining a method like that of Chou and Fasman (5), in which 
numerical values for amino acids are repetitively averaged over 
the length of a polypeptide chain, with a set of values expressing 
the relative hydrophilicity of each amino acid. Suitable values 
were available in the solvent parameters assigned by Levitt (6), 
which are derivatives of the hydrophobicity values of Nozaki and 
Tanford (9). 

In Table 1 are listed the numerical values (hydrophilicity 
values) assigned to the 20 amino acids commonly found in pro- 
teins. In the first column, the values of Levitt (6) are listed, 
whereas the second column lists the values that were finally 
chosen for our hydrophilicity calculations. The values were gen- 
erally retained as expressed by Levitt; however, changes in the 
values for proline, aspartic acid, and glutamic acid improve the
prediction results, as explained later. Hydrophilicity analysis 
of a protein is carried out by the following method. 

Each amino acid in the sequence of the protein is assigned 
its hydrophilicity value, then these values are repetitively av- 
eraged down the length of the polypeptide chain, generating 
a series of local hydrophilicity values. The number of hydro- 
philicity values that are averaged at each repetition is arbitrary, 
and we chose groups of six for our initial studies because this 
is the approximate size of an antigenic determinant (1, 10). Once 
the complete set of averaged values is obtained, the list is 
scanned to locate the highest value. According to the studies 
presented here, this high point will invariably lie within or be 
immediately adjacent to one of that protein's antigenic 
determinants. 
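
The averaging procedure just described can be sketched in a few lines of Python. This is an illustrative reconstruction, not the authors' FORTRAN program: the values are the final hydrophilicity values of Table 1, while the function names and the toy peptide are hypothetical. Each averaged value is reported at the center of its group, so a hexapeptide starting at residue 58 is reported at position 60.5.

```python
# Final hydrophilicity values (second column of Table 1), keyed by one-letter code.
HYDROPHILICITY = {
    "R": 3.0, "D": 3.0, "E": 3.0, "K": 3.0, "S": 0.3, "N": 0.2, "Q": 0.2,
    "G": 0.0, "P": 0.0, "T": -0.4, "A": -0.5, "H": -0.5, "C": -1.0,
    "M": -1.3, "V": -1.5, "I": -1.8, "L": -1.8, "Y": -2.3, "F": -2.5,
    "W": -3.4,
}


def hydrophilicity_profile(sequence, group=6):
    """Return (center_position, mean_value) pairs; positions are 1-based."""
    values = [HYDROPHILICITY[residue] for residue in sequence]
    profile = []
    for start in range(len(values) - group + 1):
        mean = sum(values[start:start + group]) / group
        center = start + 1 + (group - 1) / 2  # midpoint of the averaging group
        profile.append((center, mean))
    return profile


def highest_point(sequence, group=6):
    """Locate the point of greatest local average hydrophilicity."""
    return max(hydrophilicity_profile(sequence, group), key=lambda point: point[1])


# Hypothetical toy peptide, used only to exercise the functions.
print(highest_point("AAAAAAKKKKKK"))
```

The `group` parameter is left adjustable because the text notes that the number of values averaged at each repetition is arbitrary.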

A useful way of recording the results of this analysis is to 
produce a plot of hydrophilicity value versus sequence position. 



Abbreviations: αAbu, α-aminobutyric acid; HBsAg, hepatitis B surface
antigen. 

Table 1. Hydrophilicity values

Amino acid        s,* kcal/mol    Hydrophilicity value

Arginine               3.0               3.0
Aspartic acid          2.5               3.0
Glutamic acid          2.5               3.0
Lysine                 3.0               3.0
Serine                 0.3               0.3
Asparagine             0.2               0.2
Glutamine              0.2               0.2
Glycine                0.0               0.0
Proline               -1.4               0.0
Threonine             -0.4              -0.4
Alanine               -0.5              -0.5
Histidine             -0.5              -0.5
Cysteine              -1.0              -1.0
Methionine            -1.3              -1.3
Valine                -1.5              -1.5
Isoleucine            -1.8              -1.8
Leucine               -1.8              -1.8
Tyrosine              -2.3              -2.3
Phenylalanine         -2.5              -2.5
Tryptophan            -3.4              -3.4

* Solvent parameter values assigned by Levitt (6).

Fig. 1, the hexapeptide analysis of sperm whale myoglobin, is 
illustrative. The high point of the profile, at position 60.5, lies
within myoglobin antigenic site 2 (1). Several findings which 
proved to be generally true with other proteins can be seen in 
the myoglobin plot. First, not all antigenic determinants are 
associated with high points of hydrophilicity (for example, an- 
tigenic site 4, residues 113 through 119); second, not all high 
points are associated with antigenic determinants (position 
79.5). The one correlation which has been upheld in myoglobin 
and the other proteins that we tested, is that one antigenic de- 
terminant is consistently located at the point of maximum 
hydrophilicity. 

Computerization. To facilitate the analysis of large quantities
of sequence information, our procedure was encoded in a FORTRAN
program and run in a PDP 11/70 computer, and the resulting
data was plotted with a Tektronix automatic plotting
device.

List of Antigenic Determinants. Proteins with known antigenic
determinants were considered to belong to one of two
groups. Group 1, proteins whose antigenic structures are nearly
or completely solved, includes: (i) sperm whale myoglobin, with
antigenic determinants at residues 15-22 (site 1), 56-62 (site 2),
94-99 (site 3), 113-119 (site 4), and 145-151 (site 5) (1); (ii)
chicken lysozyme, with antigenic determinants including residues
5, 7, 13, 14, 33, 34, 62, 87, 89, 93, 96, 97, 113, 114, 116,
and 125 (2); (iii) the ferredoxin from Clostridium pasteurianum,
with antigenic determinants encompassing residues 1-7 and 51-55
(11); (iv) horse heart cytochrome c, with antigenic residues
at positions 47, 58-62, 88-92, and 96 (4, 12); and (v) bovine
myelin basic protein, with determinants in regions 64-73, 74-85,
113-121, and 153-166 (13, 14). Group 2, proteins for which
partial information is available, comprises: (i) human hemoglobin
β chains, with antigenic residues at 6, 16-23, 52, 68, 73,
and 102 (3, 15, 16); (ii) the tobacco mosaic virus (vulgare) coat
protein, with antigenic determinants at positions 62-68, 108-113,
and 153-158 (17-19); (iii) human IgG heavy chain constant
regions (each of the three constant domains of the Eu myeloma
protein was considered as an individual protein), with antigenic
determinants localized to position 214 of the CH1 domain, positions
296 and 309 of the CH2 domain, and 355 to 358 of the
CH3 domain (20); (iv) bovine α-lactalbumin, where antigenic
determinants have been located within residues 10-18, 60-80,
91-94, and 105-117 (unpublished data); and (v) leghemoglobin
a from the soybean, with antigenic sites within residues 15-23,
52-59, 92-98, 107-116, and 132-142 (21).

Evaluating Predictions. An antigenic determinant was con- 
sidered to be correctly identified by a prediction point if that 
point fell within the determinant, directly on a single antigenic 
residue, or within two residues (inclusive) on either side of any 
antigenic residue. This inclusion of a two residue "buffer zone" 
around antigenic sites is acceptable because much of the avail- 
able information implicates single residues as antigenic sites, 
although in most cases these residues probably comprise part 




Fig. 1. Hexapeptide profile of sperm whale myoglobin. The averaged antigenicity values are plotted versus position along the amino acid
sequence. The x axis (sequence position) contains 153 increments, each representing an amino acid in the sequence of myoglobin. The y axis
represents the range of hydrophilicity values (from 3 to -3.4). The data points are plotted at the center of the averaging group from which they
were derived. The plot marks the known antigenic determinants of myoglobin and shows three profiles: that obtained by assigning the "solvent
parameter" values of Levitt (6) to each amino acid; that obtained when the values for aspartic acid and glutamic acid were raised to 3.0; and
that obtained when proline was assigned the value of 0.
Table 2. Prediction success with hexapeptide maximum point

                           Correct   Wrong   Unknown
Original values               7        1        4
Asp, Glu = 3                  8        0        4
Asp, Glu = 3, Pro = 0        10        0        2



of a larger site that includes several residues immediately ad- 
jacent to them in the sequence. Furthermore, any experiments 
designed to test the validity of these antigenic determinant pre- 
dictions would be expected to include a number of residues on 
either side of the predicted point, in which case an overlap with 
the antigenic determinant would always be guaranteed. 

Owing to the limited information available on the antigenic 
structures of some of the proteins used in this study, it was not 
always possible to definitely assess the correctness of a given 
prediction. Therefore, the proteins of groups 1 and 2 were 
treated differently in generating the information shown in Ta- 
bles 2 and 3. 

For group 1 proteins, a prediction was considered correct 
(Tables 2 and 3, column 1) if it successfully located an antigenic 
determinant or wrong (column 2) if it missed. With group 2 
proteins, however, it is possible that a predicted point that 
misses known antigenic determinants may be indicating an an- 
tigenic determinant that is currently undiscovered. Therefore, 
for these proteins, predictions were considered to be correct 
(column 1) if they hit a known determinant or unknown if they 
missed (because they may yet prove to be hits). 

Adjustment of Aspartic Acid, Glutamic Acid, and Proline
Values. Table 2 shows the effect of increasing the values for 
these three amino acids from the original values given by Levitt. 
Increasing aspartic acid and glutamic acid from 2.5 to 3.0 elim- 
inated the one wrong prediction and caused an elevation of the 
plots in many regions where antigenic determinants are known 
to exist (e.g., myoglobin sites 1, 2, and 5). There was no change 
in the number of unknown predicted points in the group 2 pro- 
teins (column 3), although the new values tended to elevate the 
profiles in the locations of the known determinants in these 
proteins. Next, the value for proline was raised to zero, and the 
hexapeptide analyses were repeated. The result is shown in line 
3 of Table 2: two of the proteins that had given unknown pre- 
dicted points now resulted in correct predictions. 

The two remaining proteins with unknown prediction points 
are unusual, and it may not be worthwhile to attempt to bring 
them into the "correct" group by making further changes in 
amino acid values. For one of the two, the CH2 region of IgG,
only 2 out of 109 residues are presently known to be antigenically
active, and the predicted point may be indicating an
undiscovered antigenic determinant. In the
case of leghemoglobin a, the investigators indicate that they 

Table 3. Effect of averaging group length on predictions by the

                 Correct   Wrong   Unknown   C/(C+W), %*
Dipeptide          23       17       18          68
Tripeptide         10        5        3          67
Tetrapeptide        9        3                   76
Pentapeptide        8        2        2          80
Hexapeptide        10        0        2         100
Heptapeptide        7                 2          70
Octapeptide         6        2                   76
Nonapeptide         5                            63
Decapeptide         5        2        5          71

* Percentage of correct assignments when considering only proteins of
group 1. C, correct; W, wrong.
have not tested the antigenic activity of the predicted region
(21).

The Effect of Averaging Group Length. When only two 
amino acids are averaged at a time, the data plot is erratic, and 
the great variation in hydrophilicity over short lengths of pep- 
tide tends to obscure the general trend of the values. In addi- 
tion, dipeptide analysis results in multiple identical high points 
because any pair of charged residues will yield the maximum 
value of 3.0. The results of this can be seen in line one of Table 
3. For the 12 proteins analyzed, a total of 58 identical high points 
were obtained, and only 23 of these were associated with known 
antigenic determinants. Moreover, dipeptide analysis resulted 
in 17 wrong predictions. 

Multiple identical high points continue to be a problem for
tri- and tetrapeptide analysis; the problem finally disappears at the
pentapeptide level (and higher), yielding a single predicted point
for each of the 12 proteins. Although it is attractive to consider 
a method like the di-, tri-, or tetrapeptide analysis, which can 
predict more than one determinant per molecule, it seems more 
important to eliminate as many wrong predictions as possible 
because they reduce confidence in any given predicted anti- 
genic determinant. As averaging group length increases, the 
number of wrong predictions decreases to a minimum of zero 
for hexapeptide analysis (Table 3). Comparison of data plots for 
various averaging group lengths suggested a reason for this. In
going from di- to tetra- to hexapeptide analysis, the plots became
less chaotic and the local hydrophilicity trend became
more apparent. In going from hexa- to octa- to decapeptide anal- 
ysis, the plots became even smoother. However, wrong pre- 
dictions appeared again, and there was an increase in unknown 
predictions, whereas correct predictions fell from 10 to a low 
of 5 for nona- and decapeptide analysis. The reason for this may
be that the regions of high hydrophilicity that are recognized
well by the hexapeptide analysis begin to be obscured when
longer averaging groups are used, because they are then
combined with adjacent regions of low hydrophilicity.
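
The effect of averaging group length described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the per-residue values are the ones commonly quoted for this hydrophilicity scale (they are not listed in this part of the text), the 9-residue fragment is hypothetical, and the names `SCALE_TENTHS`, `window_totals`, and `highest_points` are invented for the example. Positions are reported as the 0-based start of each averaging group, whereas the figure plots points at the group center.

```python
# Hydrophilicity values in tenths (30 means 3.0), keyed by one-letter code.
# ASSUMPTION: commonly quoted values for this scale; they are not listed in
# this section, so check them against the paper's own table before reuse.
SCALE_TENTHS = {
    "R": 30, "D": 30, "E": 30, "K": 30, "S": 3, "N": 2, "Q": 2,
    "G": 0, "P": 0, "T": -4, "A": -5, "H": -5, "C": -10, "M": -13,
    "V": -15, "I": -18, "L": -18, "Y": -23, "F": -25, "W": -34,
}


def window_totals(sequence, window):
    """Sum the scale values (in tenths) over every full averaging group."""
    values = [SCALE_TENTHS[aa] for aa in sequence]
    return [sum(values[i:i + window]) for i in range(len(values) - window + 1)]


def highest_points(sequence, window=6):
    """Return (start_index, average) for each group tied for the top average."""
    totals = window_totals(sequence, window)
    best = max(totals)
    return [(i, best / (10 * window)) for i, t in enumerate(totals) if t == best]


# Hypothetical fragment, chosen only to illustrate the tie problem.
fragment = "KEALKDGRE"
dipeptide_hits = highest_points(fragment, window=2)
hexapeptide_hits = highest_points(fragment, window=6)
print(dipeptide_hits)    # several pairs of charged residues tie at the maximum
print(hexapeptide_hits)  # a single highest averaging group remains
```

Working in integer tenths keeps the tie comparison exact; comparing floating-point averages directly could split or merge ties through rounding.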

Second and Third Highest Points. In order to assess the gen- 
erality of the predictive value of high points, the success of the 
second and third highest points was considered. These points 
were only selected from the subset of points that had at least 
three amino acid positions between them and the highest (or 
second highest) point. This resulted in the second and third 
highest points always occurring in their own individual peak of 
hydrophilicity and the elimination of redundant prediction of 
antigenic determinants. However, neither the second nor the 
third highest points gave highly reliable prediction results. Al- 
though the correlation of predicted points with antigenic de- 
terminants seems to be significant in both cases (25% for the 
second and 33% for the third), the number of wrong predictions 
(33% in each case) severely limits the usefulness of these points 
for prediction of antigenic determinants of unknown proteins. 
These points are probably worthy of consideration in cases 
where immunochemical testing is used to verify the predictions 
because (by ignoring unknown predictions) they represent a 
43% and 50% chance of a correct prediction, respectively. 
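
The selection rule and the quoted percentages can be illustrated as follows. This is a sketch, not the authors' procedure: it assumes that "at least three amino acid positions between" means the positions differ by more than three, it applies that spacing against every point already chosen, and the profile values and the names `ranked_points` and `success_rate` are hypothetical.

```python
def ranked_points(averages, count=3, min_gap=3):
    """Pick the highest point, then further points in descending order that
    leave at least `min_gap` positions between themselves and every point
    already chosen (ASSUMED reading of the spacing rule in the text)."""
    chosen = []
    order = sorted(range(len(averages)), key=lambda i: (-averages[i], i))
    for i in order:
        if all(abs(i - j) > min_gap for j in chosen):
            chosen.append(i)
        if len(chosen) == count:
            break
    return chosen


def success_rate(correct, wrong):
    """Percentage of correct predictions when unknown predictions are ignored."""
    return round(100 * correct / (correct + wrong))


# Hypothetical averaged profile; index 1 is the highest point.
profile = [0.1, 2.0, 1.9, 1.8, 0.2, 0.3, 1.5, 0.4, 0.0, 0.1, 1.2, 1.1]
print(ranked_points(profile))   # indices 2 and 3 are skipped as part of the first peak

# Figures from the text: 25% correct and 33% wrong for the second highest
# point, 33% correct and 33% wrong for the third.
print(success_rate(25, 33), success_rate(33, 33))
```

Ignoring the unknown predictions in this way reproduces the 43% and 50% figures given in the text.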

Predictions for Uncharacterized Protein Antigens. We have 
applied our procedure to a number of proteins for which the 
location of an antigenic determinant may be of particular in- 
terest (Table 4). Several of the sequences listed in Table 4 are 
longer than six amino acids. In those cases, there are two or 
more adjacent sets of amino acids that result in identical average 
hydrophilicity values. Synthesis of short peptides should verify 
that these sequences are in, or immediately adjacent to, anti- 
genic determinants. 
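
A minimal sketch of why some listed sequences are longer than six residues: when adjacent averaging groups tie for the highest value, their union is reported. The per-residue values are assumed (the commonly quoted values for this scale, not listed in this part of the text), the function name `top_stretches` is invented, and the input is only the eight-residue fibroblast interferon stretch from Table 4 rather than the whole protein, so the index is relative to that fragment.

```python
# ASSUMED hydrophilicity values in tenths (30 means 3.0); not given in this section.
SCALE_TENTHS = {
    "R": 30, "D": 30, "E": 30, "K": 30, "S": 3, "N": 2, "Q": 2,
    "G": 0, "P": 0, "T": -4, "A": -5, "H": -5, "C": -10, "M": -13,
    "V": -15, "I": -18, "L": -18, "Y": -23, "F": -25, "W": -34,
}


def top_stretches(sequence, window=6):
    """Return the top average and the merged stretches covered by runs of
    adjacent averaging groups that tie for that average."""
    values = [SCALE_TENTHS[aa] for aa in sequence]
    totals = [sum(values[i:i + window]) for i in range(len(values) - window + 1)]
    best = max(totals)
    starts = [i for i, t in enumerate(totals) if t == best]
    stretches = []
    first = last = starts[0]
    for s in starts[1:]:
        if s == last + 1:          # adjacent group with the same average
            last = s
        else:                      # separate peak: close the current stretch
            stretches.append((first, sequence[first:last + window]))
            first = last = s
    stretches.append((first, sequence[first:last + window]))
    return best / (10 * window), stretches


# Glu-Glu-Lys-Leu-Glu-Lys-Glu-Asp, the fibroblast interferon entry of Table 4.
average, stretches = top_stretches("EEKLEKED")
print(average, stretches)
```

With these assumed values, all three hexapeptides in the fragment have the same average, so the merged stretch spans the full eight residues.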

To this end, we have recently used the Merrifield procedure
to synthesize a peptide having the sequence αAbu-αAbu-Thr-
Table 4. Protein sequences with greatest average hydrophilicity*

Protein                                      Sequence

HBsAg (22)                                   141-Lys-Pro-Thr-Asp-Gly-Asn
Influenza hemagglutinins
  A/Victoria/3/75 strain (23)                171-Asn-Asp-Asn-Ser-Asp-Lys
  A/Aichi/2/68 strain (24)                   88-Val-Glu-Arg-Ser-Lys-Ala
Fowl plague virus hemagglutinin (25)         97-Glu-Arg-Arg-Glu-Gly-Asn
Human histocompatibility antigen HLA-B7 (26) 43-Pro-Arg-Glu-Glu-Pro-Arg
Human interferons
  Fibroblast (27)                            103-Glu-Glu-Lys-Leu-Glu-Lys-Glu-Asp
  Leukocyte I (28)                           160-Glu-Arg-Leu-Arg-Arg-Lys-Glu
  Leukocyte A (29)                           131-Lys-Glu-Lys-Lys-Tyr-Ser
E. coli enterotoxins
  Heat labile (30)                           66-Glu-Arg-Met-Lys-Asp-Thr
  Heat stable (31) (two identical peaks)     26-Asp-Ser-Ser-Lys-Glu-Lys
                                             46-Ser-Glu-Lys-Lys-Ser-Glu
Cholera toxin B chain (32)                   79-Glu-Ala-Lys-Val-Glu-Lys
Streptococcal M protein (33)                 58-Arg-Lys-Ala-Asp-Leu-Glu-Lys
Ragweed allergens
  Ra3 (34)                                   88-Cys-Thr-Lys-Asp-Gln-Lys
  Ra5 (35)                                   40-Ser-Lys-Lys-Cys-Gly-Lys
Semliki Forest virus membrane proteins (36)
  E1                                         70-Thr-Lys-Glu-Lys-Pro-Asp
  E2                                         246-Asp-Glu-Pro-Ala-Arg-Lys
  E3                                         40-Glu-Asp-Asn-Val-Asp-Arg

* For each protein listed, the sequence of amino acids having the greatest
average hydrophilicity value is shown; the number before the sequence
indicates the position of the first amino acid in the group.



Lys-Pro-Thr-Asp-Gly-Asn-αAbu-Thr-αAbu (αAbu = α-aminobutyric
acid, replacing Cys) corresponding to residues 138-149
of the hepatitis B surface antigen (HBsAg) protein, and tested
it for antigenic activity. The peptide side chains were deprotected
under conditions where the peptide remained attached
to the polystyrene beads (21). The peptidyl beads were then
used to replace the polystyrene beads normally used in the
Ausria II radioimmunoassay for HBsAg (Abbott), yielding a
clearly positive binding affinity for 125I-labeled anti-HBsAg
antibodies. Beads without peptide, or peptidyl beads in which the
side chain protecting groups had not been removed, did not
bind significant 125I-labeled anti-HBsAg antibody. Details of
these experiments will be published elsewhere.

DISCUSSION 

The studies described demonstrate the usefulness and limita- 
tions of antigenic determinant prediction by hydrophilicity 
analysis. The peak hexapeptide prediction value is highly suc- 
cessful, yielding no wrong assignments in 12 proteins; only lack 
of information on 2 of the 12 proteins makes it unclear whether 
the present method has a 100% success rate. On the other hand, 
the second and third highest peaks result in a mixture of correct
and incorrect assignments and are therefore less useful as pre-
dictors of antigenic determinants. It is clear by inspection of the 
data plots that some antigenic determinants are not correlated 
with hydrophilicity, although there does seem to be a corre- 
lation of many antigenic determinants with local upspikes of the 
hydrophilicity profile. This suggests that our present method 
may be a good basis on which to superimpose other types of 
information that boost the values of these low peaks. For
example, it may be possible to improve prediction success by
considering currently available methods for predicting secondary
structure, particularly β bends.

Our method bears some resemblance to the procedure re- 
ported by Rose and Roy for predicting protein packing by hy- 
drophobicity analysis (8), but it also has distinct differences that 
make it a better system for locating antigenic determinants. 
Because their approach utilizes the hydrophobicity values of
Nozaki and Tanford (9) without the adjustments introduced by
Levitt (6), the values for all hydrophilic amino acids are identical 
(i.e., 0), whereas the corresponding values used in our proce- 
dure range from 0.2 to 3.0. This results in a strong influence 
by the charged amino acids and an intermediate effect for neu- 
tral polar amino acids. Furthermore, Rose and Roy use a least- 
squares fitting of data to a quadratic polynomial with a seven- 
point moving window rather than hexapeptide averaging. This 
results in greater smoothing of the profile and end effects. Both 
of these qualities seem to decrease the potential usefulness for 
antigenic determinant prediction. In contrast, our method de- 
pends upon simpler calculations and a shorter averaging-group 
length and is capable of considering all amino acids from the 
amino-terminal to the carboxyl-terminal residue. 

Finally, it should be emphasized that the ability to predict 
antigenic determinants from amino acid sequence data alone is 
potentially very useful, even though only a single determinant 
can be predicted with confidence for any given molecule. For 
example, many proteins whose antigenic structures are of in- 
terest are not available in quantities sufficient to allow conven- 
tional immunochemical studies to be carried out, as is the case 
with many of the proteins for which we listed predictions in the 
preceding section. Increasingly, amino acid sequence infor- 
mation for such proteins is being obtained by microchemical 
methods or by nucleotide sequence analysis, so that sufficient 
material for conventional immunochemical analysis is never 
available. However, once an antigenic determinant has been 
predicted, it should be possible to verify its existence by syn- 
thesizing the indicated region chemically and testing its activity 
in an appropriate immune assay, such as inhibition of cytotox- 
icity or precipitation inhibition. Furthermore, it should be pos- 
sible to raise antisera against such synthetic determinants, as 
Arnon et al. have done for a bacteriophage (37). Ultimately, 
predicted antigenic determinants from proteins of pathogenic 
organisms might be useful in the production of synthetic 
vaccines. 
LOCUS       AAA72260                 192 aa                     SYN 27-APR-1993
DEFINITION  human growth hormone.
ACCESSION   AAA72260
VERSION     AAA72260.1  GI:208528
DBSOURCE    locus SYNHUMGHS accession K02382.
KEYWORDS
SOURCE      synthetic construct
  ORGANISM  synthetic construct
            other sequences; artificial sequences.
REFERENCE   1  (residues 1 to 192)
  AUTHORS   Ikehara,M., Ohtsuka,E., Tokunaga,T., Taniyama,Y.O., Iwai,S.,
            Kitano,K., Miyamoto,S., Ohgi,T., Sakuragawa,Y., Fujiyama,K.,
            Ikari,T., Kobayashi,M., Miyake,T., Shibahara,S., Ono,A., Ueda,T.,
            Tanaka,T., Baba,H., Miki,T., Sakurai,A., Oishi,T., Chisaka,O. and
            Matsubara,K.
  TITLE     Synthesis of a gene for human growth hormone and its expression in
            Escherichia coli
  JOURNAL   Proc. Natl. Acad. Sci. U.S.A. 81 (19), 5956-5960 (1984)
  PUBMED    §0M1ZA
COMMENT     [1] synthesized this gene using the phosphotriester method with
            frequently occuring amino acid codons of E. coli. When the gene
            was inserted into an E. coli plasmid used to transform E. coli
            cells, a polypeptide identical to natural human growth hormone was
            produced.

            Method: conceptual translation.
FEATURES             Location/Qualifiers
     source          1..192
                     /organism="synthetic construct"
                     /db_xref="taxon:32630"
     Protein         1..192
                     /name="human growth hormone"
     Region          5..190
                     /region_name="Hormone_1"
                     /note="Somatotropin hormone family; pfam00103"
      181 vqcrsvegsc gf